Why Corpus-Based Statistics-Oriented Machine Translation
Abstract
Rule-based approaches have been the dominant paradigm in developing MT systems. Such approaches, however, suffer from difficulties in knowledge acquisition when they must cope with the wide variety and time-varying characteristics of real text. To attack this problem, some statistical translation models and supporting tools have been developed in the last few years. However, a simple statistical model often results in a large parameter space and thus requires a large training corpus. It is therefore necessary to introduce language models that take advantage of well-justified linguistic knowledge in order to make stochastic MT systems practical. A stochastic model that emphasizes the adoption of well-justified linguistic knowledge in its design is called a corpus-based statistics-oriented approach. In this paper, the corpus-based statistics-oriented paradigm is proposed, and its characteristics are compared with those of other methodologies. Recent progress in some corpus-based statistics-oriented models for MT is also reviewed.
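The claim that a simple statistical model leads to a large parameter space can be made concrete with a little arithmetic. The sketch below is an illustration only, not a model from the paper: it assumes a monolingual word n-gram model and, as one possible way of injecting linguistic knowledge, a class-based bigram in which every word belongs to exactly one hypothetical class (for instance, a part-of-speech tag). The vocabulary size V = 10,000 and class count C = 100 are arbitrary assumptions chosen for the example.

```python
def ngram_free_params(vocab_size: int, order: int) -> int:
    """Free parameters of an unrestricted word n-gram model.

    Each of the vocab_size**(order - 1) histories has its own probability
    distribution over vocab_size words, i.e. vocab_size - 1 free values.
    """
    return vocab_size ** (order - 1) * (vocab_size - 1)


def class_bigram_free_params(vocab_size: int, num_classes: int) -> int:
    """Free parameters of a class-based bigram with a hard word-to-class map.

    Assumed factorization (illustrative, not taken from the paper):
        P(w_i | w_{i-1}) = P(w_i | c(w_i)) * P(c(w_i) | c(w_{i-1}))
    - class transitions: num_classes histories x (num_classes - 1) free values
    - word emissions: each class c contributes |c| - 1 free values,
      which sums to vocab_size - num_classes over all classes
    """
    transitions = num_classes * (num_classes - 1)
    emissions = vocab_size - num_classes
    return transitions + emissions


if __name__ == "__main__":
    V, C = 10_000, 100  # hypothetical vocabulary and class-inventory sizes
    print(f"word bigram:  {ngram_free_params(V, 2):,}")
    print(f"word trigram: {ngram_free_params(V, 3):,}")
    print(f"class bigram: {class_bigram_free_params(V, C):,}")
```

With these assumed sizes the word bigram has 99,990,000 free parameters and the word trigram 999,900,000,000, while the class bigram has only 19,800. This suggests why a model constrained by linguistically motivated categories can be estimated reliably from a far smaller training corpus.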
Similar resources
Corpus-Based Statistics-Oriented (CBSO) Machine Translation Researches in Taiwan
A brief introduction to the MT research projects in Taiwan is given in this paper. Special attention is given to the more and more popular corpus-based statistics-oriented (CBSO) approaches in MT researches. In particular, the parameterized two-way training philosophy in designing the second generation BehaviorTran, which is the first and the largest operational system in this area, is introduc...
Improving the precision of automatically constructed human-oriented translation dictionaries
In this paper we address the problem of automatic acquisition of a human-oriented translation dictionary from a large-scale parallel corpus. The initial translation equivalents can be extracted with the help of the techniques and tools developed for the phrase-table construction in statistical machine translation. The acquired translation equivalents usually provide good lexicon coverage, but t...
Sub-Sentential Alignment Method by Analogy
This paper describes a method for searching word correspondences between pairs of translation sentences. In the Example-Based Machine Translation, translation patterns can be extracted easily if word correspondences between pairs of translation sentences are defined. The popular methods for aligning bilingual corpus at a sub-sentential level are unable to produce reliable result when the size of...
Translation Ambiguity Resolution Based On Text Corpora Of Source And Target Languages
We propose a new method to resolve ambiguity in translation and meaning interpretation using linguistic statistics extracted from dual corpora of source and target languages in addition to the logical restrictions described on dictionary and grammar rules for ambiguity resolution. It provides reasonable criteria for determining a suitable equivalent translation or meaning by making the depende...
How to Avoid Burning Ducks: Combining Linguistic Analysis and Corpus Statistics for German Compound Processing
Compound splitting is an important problem in many NLP applications which must be solved in order to address issues of data sparsity. Previous work has shown that linguistic approaches for German compound splitting produce a correct splitting more often, but corpus-driven approaches work best for phrase-based statistical machine translation from German to English, a worrisome contradiction. We ...